
Cache (computing)
Understanding Cache in Computer Architecture: A Resource for Building from Scratch
Welcome to a key component in the performance of modern computers: the cache. As you embark on the journey of understanding and potentially building a computer from the ground up, comprehending how caches work is essential. They are not just abstract concepts but fundamental performance accelerators integrated at multiple levels of a computer system.
What is a Cache?
In simple terms, a cache is a smaller, faster storage component that holds copies of data from a larger, slower storage component. Its primary purpose is to serve future requests for that data faster than accessing the original source would allow.
Cache (pronounced 'KASH'): A hardware or software component that stores data so that future requests for that data can be served faster. The data stored in a cache is typically a copy of data residing in a slower 'backing store'.
Think of it like having a small desk (cache) next to a large filing cabinet (backing store). You pull frequently needed documents from the filing cabinet and keep them on your desk. When you need one of those documents again, it's much quicker to grab it from your desk than to walk back to the filing cabinet, open the drawer, and find it again.
Caches are ubiquitous in computing, found in hardware (like within the CPU, on hard drives, and network cards) and implemented in software (like in operating systems, web browsers, and databases).
The Motivation: Bridging the Performance Gap
Why are caches so important? The fundamental reason lies in the inherent trade-offs when designing memory and storage systems:
- Speed vs. Capacity: Faster memory technologies (like SRAM) are typically more expensive and consume more power, making them impractical for very large capacities. Slower technologies (like DRAM, Flash, and magnetic disks) offer much larger capacities at a lower cost per bit.
- Distance: Even if memory technologies were the same speed, larger capacity implies larger physical size. Signals need time to travel across wires. A component physically farther away takes longer to access.
- Technology Evolution: Processors (CPUs) have historically become much faster than the main memory technologies they interact with (like DRAM). This creates a significant performance gap. If the CPU had to wait for every piece of data from the slower main memory, it would spend most of its time idle, waiting.
Memory Hierarchy: Modern computer systems use a hierarchy of storage devices, ordered by speed, cost, and capacity. Faster, more expensive technologies (like CPU registers and L1 cache) are smaller and sit closer to the CPU, while slower, cheaper technologies (like main memory, SSDs, HDDs) are larger and further away. Caches exist between levels of this hierarchy to bridge the performance gap.
The cache mitigates the performance gap by exploiting a property of how computer programs typically access data, known as locality of reference:
- Temporal Locality: If a piece of data is accessed, it is likely to be accessed again soon. (e.g., a loop variable, frequently used instruction).
- Spatial Locality: If a piece of data is accessed, data located nearby in memory is likely to be accessed soon. (e.g., elements of an array accessed sequentially, instructions in a linear code path).
By storing recently or frequently used data, and data located near it, in a fast cache, subsequent accesses to this data can be served quickly from the cache, hiding the latency of accessing the slower backing store.
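To make locality concrete, here is a small, hypothetical C loop (not taken from the original article) whose access pattern shows both kinds at once:

```c
#include <stdio.h>

/* Summing an array: a simple access pattern that caches reward well. */
int main(void) {
    int data[1024];
    for (int i = 0; i < 1024; i++)
        data[i] = i;                 /* sequential writes: spatial locality */

    long sum = 0;
    for (int i = 0; i < 1024; i++)
        sum += data[i];              /* data[i], data[i+1], ... sit next to
                                        each other in memory (spatial
                                        locality); i and sum are reused on
                                        every iteration (temporal locality) */

    printf("sum = %ld\n", sum);
    return 0;
}
```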
How Caches Operate
Understanding the core operation of a cache is crucial whether you're building hardware or software components that utilize or implement caching.
A cache is essentially a collection of entries. Each entry stores:
- Data: A copy of a block of data from the backing store.
- Tag: An identifier that specifies which data block from the backing store this cache entry is a copy of. This tag is necessary because the cache is much smaller than the backing store, so not every data block can be present simultaneously.
When a component (the "cache client") needs to access data from the backing store, it first checks the cache:
- The client presents the identifier of the data it needs (e.g., a memory address, a file path, a URL).
- The cache uses this identifier to look for a matching tag among its entries.
Based on the lookup result, one of two situations occurs:
Cache Hit
Cache Hit: Occurs when the requested data's tag is found in the cache, and the data stored in the cache entry is valid and up-to-date.
If a hit occurs, the cache can provide the requested data directly to the client. This is much faster than accessing the backing store.
Hit Rate (or Hit Ratio): The percentage of data access requests that result in a cache hit. A higher hit rate indicates better cache performance, as more requests are served quickly from the cache.
Example: Imagine a CPU needs data from memory address 0x1234. It sends this request to the cache controller. The cache controller checks whether it has an entry with a tag corresponding to the memory block containing 0x1234. If it does, and the data is valid, it retrieves the data from the cache and sends it to the CPU immediately.
Cache Miss
Cache Miss: Occurs when the requested data's tag is not found in the cache, or the cache entry is invalid or out-of-date.
If a miss occurs, the requested data must be retrieved from the slower backing store. This is the slower path. Once the data is retrieved, it is typically copied into the cache, in anticipation that it will be needed again soon due to locality of reference.
Example: The CPU requests data from 0x5678. The cache controller checks its tags but finds no entry corresponding to the memory block containing 0x5678. This is a cache miss. The cache controller then issues a read request to main memory (the backing store) to fetch the data block containing 0x5678. Once the block arrives, the cache stores a copy of it and then provides the requested data (the contents of 0x5678) to the CPU.
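To make the hit/miss flow concrete, the sketch below models a tiny direct-mapped cache in C. The sizes, the structure and function names, and the backing_store array are illustrative assumptions, not a description of any particular hardware; a real cache controller does the same tag check in logic gates rather than software.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NUM_LINES  64      /* assumed cache size: 64 lines          */
#define LINE_BYTES 16      /* assumed block (line) size: 16 bytes   */

/* One cache entry: a valid bit, a tag, and a copy of one block. */
typedef struct {
    bool     valid;
    uint32_t tag;
    uint8_t  data[LINE_BYTES];
} cache_line;

static cache_line cache[NUM_LINES];
static uint8_t backing_store[1 << 16];   /* stand-in for slow main memory */

/* Read one byte through the cache; report whether it was a hit. */
uint8_t cached_read(uint32_t addr, bool *hit) {
    uint32_t offset = addr % LINE_BYTES;
    uint32_t block  = addr / LINE_BYTES;
    uint32_t index  = block % NUM_LINES;  /* which cache slot to check   */
    uint32_t tag    = block / NUM_LINES;  /* identifies the block        */

    cache_line *line = &cache[index];
    *hit = line->valid && line->tag == tag;
    if (!*hit) {
        /* Miss: fetch the whole block from the backing store. */
        memcpy(line->data, &backing_store[block * LINE_BYTES], LINE_BYTES);
        line->tag   = tag;
        line->valid = true;
    }
    return line->data[offset];
}

int main(void) {
    backing_store[0x1234] = 42;
    bool hit;
    cached_read(0x1234, &hit);
    printf("first access:  %s\n", hit ? "hit" : "miss");   /* miss */
    uint8_t v = cached_read(0x1234, &hit);
    printf("second access: %s, value %u\n", hit ? "hit" : "miss", v); /* hit, 42 */
    return 0;
}
```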
Replacement Policies
Since the cache is smaller than the backing store, a cache miss often requires bringing a new data block into the cache. If the cache is full, an existing entry must be removed to make space for the new data. The replacement policy is the algorithm or heuristic used to decide which existing entry to evict.
Replacement Policy: An algorithm used by a cache to decide which existing cache entry to remove when the cache is full and new data needs to be added (during a cache miss). The goal is to remove an entry that is least likely to be needed again soon, to maximize the hit rate.
Common replacement policies include:
- Least Recently Used (LRU): Evicts the entry that has not been accessed for the longest time. This policy leverages temporal locality, assuming that data not used recently is less likely to be used in the near future.
- Least Frequently Used (LFU): Evicts the entry that has been accessed the fewest times. This policy focuses on overall popularity.
- First-In, First-Out (FIFO): Evicts the entry that has been in the cache the longest. Simple to implement but doesn't consider usage patterns.
- Random: Evicts a randomly selected entry. Simple and avoids some pathological worst-case scenarios for other policies.
Choosing an effective replacement policy is crucial for cache performance and involves trade-offs between implementation complexity and potential hit rate improvement. Implementing LRU in hardware, for example, can be quite complex as it requires tracking the access time or order of every entry.
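As an illustration of how LRU can be tracked, here is a minimal C sketch of a tiny fully associative cache that stamps each entry with a logical access counter and evicts the entry with the oldest stamp on a miss. The entry layout and names are assumptions made for the example; real hardware usually approximates LRU rather than tracking it exactly.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_ENTRIES 4            /* assumed: a tiny fully associative cache */

typedef struct {
    bool     valid;
    uint32_t key;                /* identifies which block this entry holds */
    uint64_t last_used;          /* logical timestamp for LRU               */
} entry;

static entry    cache[NUM_ENTRIES];
static uint64_t clock_ticks = 0;

/* Access `key`; on a miss, evict the least recently used entry. */
bool access_key(uint32_t key) {
    clock_ticks++;

    /* Look for a hit. */
    for (int i = 0; i < NUM_ENTRIES; i++) {
        if (cache[i].valid && cache[i].key == key) {
            cache[i].last_used = clock_ticks;   /* refresh recency */
            return true;                        /* hit             */
        }
    }

    /* Miss: use an empty entry if one exists, else the LRU entry. */
    int victim = 0;
    for (int i = 0; i < NUM_ENTRIES; i++) {
        if (!cache[i].valid) { victim = i; break; }
        if (cache[i].last_used < cache[victim].last_used) victim = i;
    }
    cache[victim].valid     = true;
    cache[victim].key       = key;
    cache[victim].last_used = clock_ticks;
    return false;                               /* miss            */
}

int main(void) {
    uint32_t trace[] = {1, 2, 3, 4, 1, 5, 1};   /* 5 evicts 2 (the LRU) */
    for (size_t i = 0; i < sizeof trace / sizeof trace[0]; i++)
        printf("access %u: %s\n", trace[i],
               access_key(trace[i]) ? "hit" : "miss");
    return 0;
}
```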
Write Policies
What happens when the client writes data instead of reading it? The modified data exists in the cache, but eventually, this change must be reflected in the backing store. The write policy determines when and how this synchronization occurs.
Write Policy: Determines how write operations are handled by the cache, specifically when modifications made in the cache are propagated to the backing store.
The two primary write policies are:
Write-Through: Data is written synchronously to both the cache and the backing store simultaneously.
- Pros: The backing store is always up-to-date, simplifying consistency (cache coherence) if other systems access the backing store directly. Write misses are simpler as the write goes straight to the backing store.
- Cons: Every write operation involves the latency of the backing store, potentially slowing down write performance compared to writing only to the fast cache.
Write-Back (or Write-Behind): Data is initially written only to the cache. The write to the backing store is delayed until the modified data block is about to be removed from the cache (evicted by the replacement policy) or explicitly flushed.
- Pros: Write operations are much faster as they only involve the cache's speed. Multiple writes to the same data block in the cache only require one write to the backing store when the block is eventually evicted.
- Cons: The data in the backing store is temporarily stale (out-of-date) compared to the data in the cache. This requires extra complexity to track modified blocks and manage consistency, especially if multiple caches or other systems access the same backing store.
Dirty Block: In a write-back cache, a cache entry that has been modified since it was loaded from the backing store is marked as 'dirty'. A dirty block must be written back to the backing store before it can be replaced by new data. This write-back process for a dirty block is often called a "lazy write".
During a write miss (the client tries to write data to a location not currently in the cache), the cache needs a write-miss policy:
Write-Miss Policy: Determines whether data is loaded into the cache on a write miss.
Write Allocate (or Fetch on Write): On a write miss, the data block containing the target location is first loaded into the cache (like a read miss), and then the write operation is performed on the newly loaded cache entry (now a write hit).
- Typically paired with Write-Back, as subsequent writes or reads to the same block will likely hit in the cache.
No-Write Allocate (or Write Around): On a write miss, the data is written directly to the backing store, bypassing the cache. The data block is not loaded into the cache.
- Typically paired with Write-Through, as there's no benefit to loading the block into the cache if subsequent writes still have to go to the backing store immediately.
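The sketch below combines write-back with write-allocate for a small direct-mapped cache: a write only touches the cache and sets a dirty bit, and a dirty line is written back to the backing store when it is evicted. All names and sizes are illustrative assumptions for this example.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NUM_LINES  8
#define LINE_BYTES 16

typedef struct {
    bool     valid;
    bool     dirty;              /* modified since it was loaded?          */
    uint32_t tag;
    uint8_t  data[LINE_BYTES];
} cache_line;

static cache_line cache[NUM_LINES];
static uint8_t backing_store[1 << 16];

/* Bring the block containing `addr` into its slot, writing back the old
 * contents first if they are dirty (the "lazy write").                  */
static cache_line *fill_line(uint32_t addr) {
    uint32_t block = addr / LINE_BYTES;
    uint32_t index = block % NUM_LINES;
    uint32_t tag   = block / NUM_LINES;
    cache_line *line = &cache[index];

    if (!line->valid || line->tag != tag) {
        if (line->valid && line->dirty) {
            /* Evicting a dirty line: write it back to the backing store. */
            uint32_t old_block = line->tag * NUM_LINES + index;
            memcpy(&backing_store[old_block * LINE_BYTES],
                   line->data, LINE_BYTES);
        }
        memcpy(line->data, &backing_store[block * LINE_BYTES], LINE_BYTES);
        line->tag   = tag;
        line->valid = true;
        line->dirty = false;
    }
    return line;
}

/* Write-back + write-allocate: a write miss loads the block first, then
 * the write only touches the cache and marks the line dirty.            */
void cached_write(uint32_t addr, uint8_t value) {
    cache_line *line = fill_line(addr);
    line->data[addr % LINE_BYTES] = value;
    line->dirty = true;
}

int main(void) {
    cached_write(0x0100, 7);          /* write miss: allocate, mark dirty  */
    printf("backing store after write: %u\n", backing_store[0x0100]); /* 0 */
    cached_write(0x0100 + NUM_LINES * LINE_BYTES, 9);  /* same slot: evict  */
    printf("backing store after evict: %u\n", backing_store[0x0100]); /* 7 */
    return 0;
}
```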
Cache Coherence
When multiple entities (e.g., multiple CPU cores, or a CPU cache and a DMA controller) can access or cache the same data from a shared backing store, copies of the data might exist in different caches. If one entity modifies its copy, the copies in other caches become stale. Ensuring that all entities see a consistent view of the data is the problem of cache coherence.
Cache Coherence: A consistency problem that arises when multiple caches or entities hold copies of the same data. Cache coherence protocols are mechanisms used to ensure that all copies are consistent and that all entities see updates in a predictable manner.
Implementing cache coherence in multi-processor systems is a significant challenge in hardware design, involving complex protocols (like MESI, MOESI, etc.) to track the state of data blocks across different caches and trigger invalidations or updates when data is modified.
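The following is a deliberately simplified sketch of the per-line state tracking a MESI-style protocol performs. Real protocols also involve bus or directory transactions, acknowledgements, and many more corner cases; the event names and the others_have_copy flag here are assumptions made to keep the example small.

```c
#include <stdio.h>

/* MESI states for one cache line in one core's cache. */
typedef enum { MODIFIED, EXCLUSIVE, SHARED, INVALID } mesi_state;

/* Simplified events seen by that line. */
typedef enum {
    LOCAL_READ,     /* this core reads the line          */
    LOCAL_WRITE,    /* this core writes the line         */
    REMOTE_READ,    /* another core reads the same block */
    REMOTE_WRITE    /* another core writes the same block*/
} mesi_event;

/* Next state for the line. `others_have_copy` matters only when an
 * invalid line is read: it decides between Exclusive and Shared.      */
mesi_state next_state(mesi_state s, mesi_event e, int others_have_copy) {
    switch (e) {
    case LOCAL_READ:
        if (s == INVALID)                     /* read miss: fetch block    */
            return others_have_copy ? SHARED : EXCLUSIVE;
        return s;                             /* M/E/S stay as they are    */
    case LOCAL_WRITE:
        return MODIFIED;                      /* gain ownership; other     */
                                              /* copies must be invalidated*/
    case REMOTE_READ:
        if (s == MODIFIED || s == EXCLUSIVE)  /* supply/flush data, then   */
            return SHARED;                    /* both copies are shared    */
        return s;
    case REMOTE_WRITE:
        return INVALID;                       /* our copy is now stale     */
    }
    return s;
}

int main(void) {
    const char *names[] = {"Modified", "Exclusive", "Shared", "Invalid"};
    mesi_state s = INVALID;
    s = next_state(s, LOCAL_READ, 0);   printf("after local read:   %s\n", names[s]);
    s = next_state(s, LOCAL_WRITE, 0);  printf("after local write:  %s\n", names[s]);
    s = next_state(s, REMOTE_READ, 1);  printf("after remote read:  %s\n", names[s]);
    s = next_state(s, REMOTE_WRITE, 1); printf("after remote write: %s\n", names[s]);
    return 0;
}
```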
Prefetching
To further improve performance, some caches employ prefetching. This involves anticipating data needs and loading data into the cache before the client explicitly requests it.
Prefetching: A technique where data is loaded into the cache in anticipation of future requests, often based on observed access patterns (like sequential reads) or explicit hints.
Prefetching can hide the latency of accessing the backing store by making the data available in the cache when it's needed. However, incorrect prefetching can pollute the cache with data that isn't actually used, potentially evicting valuable data and reducing the hit rate.
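Here is a minimal sketch of sequential (next-line) prefetching, assuming the same kind of direct-mapped cache as in the earlier examples: every demand access also pulls in the following block, so a linear scan finds most blocks already resident.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NUM_LINES  64
#define LINE_BYTES 16

typedef struct { bool valid; uint32_t tag; uint8_t data[LINE_BYTES]; } cache_line;

static cache_line cache[NUM_LINES];
static uint8_t backing_store[1 << 16];

/* Load the block containing `addr` into the cache if it is not there. */
static void fetch_block(uint32_t addr) {
    uint32_t block = addr / LINE_BYTES;
    uint32_t index = block % NUM_LINES;
    uint32_t tag   = block / NUM_LINES;
    cache_line *line = &cache[index];
    if (!line->valid || line->tag != tag) {
        memcpy(line->data, &backing_store[block * LINE_BYTES], LINE_BYTES);
        line->tag   = tag;
        line->valid = true;
    }
}

/* Read with a simple sequential prefetch: after touching one block,
 * also bring in the following block before it is requested.           */
uint8_t read_with_prefetch(uint32_t addr) {
    fetch_block(addr);
    fetch_block(addr + LINE_BYTES);          /* speculative prefetch     */
    uint32_t block = addr / LINE_BYTES;
    return cache[block % NUM_LINES].data[addr % LINE_BYTES];
}

int main(void) {
    for (uint32_t a = 0; a < 256; a++)
        backing_store[a] = (uint8_t)a;
    /* A sequential scan: every block after the first arrives via prefetch. */
    uint32_t sum = 0;
    for (uint32_t a = 0; a < 256; a++)
        sum += read_with_prefetch(a);
    printf("sum = %u\n", sum);               /* 0 + 1 + ... + 255 = 32640 */
    return 0;
}
```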
Examples of Caches
Caches appear at many levels of a computing system, from the lowest hardware levels to high-level software applications. Understanding these examples illustrates the versatility and importance of the caching principle.
Hardware Caches (Critical for "From Scratch" Building)
Hardware caches are typically managed entirely by dedicated hardware logic and are designed for extremely high speed.
CPU Cache
These are the most performance-critical caches in a system. Modern CPUs include multiple levels of cache, often on the CPU die itself, operating at speeds close to the CPU core frequency.
- Levels: CPUs typically have an L1 cache (Level 1), L2 cache (Level 2), and sometimes L3 cache (Level 3), and even L4.
- L1 Cache: Smallest, fastest, closest to the CPU core. Often split into separate Instruction Cache (I-cache) for program instructions and Data Cache (D-cache) for data operands. Typically measured in tens of kilobytes (KB). Accessed in a few CPU cycles.
- L2 Cache: Larger and slightly slower than L1. May be unified (stores both instructions and data). Typically measured in hundreds of KB or a few megabytes (MB). Accessed in tens of CPU cycles.
- L3 Cache: Largest and slowest of the on-die caches. Usually unified and shared among multiple CPU cores. Typically measured in several to tens of MB. Accessed in tens to hundreds of CPU cycles.
- Motivation in this context: When building a CPU, you'd quickly realize that connecting directly to main DRAM memory for every instruction fetch or data access would make your fast CPU core wait constantly. On-die SRAM caches are essential to feed the CPU pipeline efficiently. You'd design cache controllers, define block sizes (cache lines), associativity (how data is mapped from main memory to cache slots), and implement replacement and write policies in hardware logic.
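For a direct-mapped design, the address split works roughly as in the sketch below. The 32 KB / 64-byte geometry is just an assumed example, and a set-associative cache would use the index to select a set of several candidate lines rather than a single slot.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed geometry: 32 KB direct-mapped cache with 64-byte lines.      */
#define CACHE_BYTES  (32 * 1024)
#define LINE_BYTES   64
#define NUM_LINES    (CACHE_BYTES / LINE_BYTES)   /* 512 lines          */

/* Split an address into the three fields a cache controller uses:
 * offset within the line, index of the cache slot, and tag.           */
void split_address(uint32_t addr,
                   uint32_t *tag, uint32_t *index, uint32_t *offset) {
    *offset = addr % LINE_BYTES;               /* low 6 bits            */
    *index  = (addr / LINE_BYTES) % NUM_LINES; /* next 9 bits           */
    *tag    = addr / (LINE_BYTES * NUM_LINES); /* remaining high bits   */
}

int main(void) {
    uint32_t tag, index, offset;
    split_address(0x12345678, &tag, &index, &offset);
    printf("tag=0x%x index=%u offset=%u\n", tag, index, offset);
    return 0;
}
```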
Translation Lookaside Buffer (TLB)
This is a specialized cache within the Memory Management Unit (MMU).
Translation Lookaside Buffer (TLB): A specialized, high-speed cache used by the Memory Management Unit (MMU) to store recent mappings between virtual memory addresses and physical memory addresses.
In systems with virtual memory, every memory access involves translating a virtual address generated by the CPU into a physical address that the memory hardware understands. This translation typically requires looking up page table entries stored in main memory. Accessing the page table on every memory operation would be prohibitively slow. The TLB caches these translations. A TLB hit means the translation is found instantly; a TLB miss requires a slower page table walk through main memory.
- Motivation in this context: When designing an MMU for your computer, you'll implement virtual memory. You'll quickly see the performance bottleneck of page table walks. A TLB is the standard hardware solution to cache these translations, making virtual memory practical.
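A minimal sketch of a direct-mapped TLB follows, with the page table reduced to a simple in-memory array. The sizes and names are assumptions for illustration; real MMUs add permission bits, address-space identifiers, and multi-level page-table walks.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE   4096
#define TLB_ENTRIES 16
#define NUM_PAGES   1024                     /* assumed small address space */

typedef struct {
    bool     valid;
    uint32_t vpn;                            /* virtual page number (tag)   */
    uint32_t pfn;                            /* physical frame number       */
} tlb_entry;

static tlb_entry tlb[TLB_ENTRIES];
static uint32_t  page_table[NUM_PAGES];      /* stand-in for the in-memory  */
                                             /* page table (the slow path)  */

/* Translate a virtual address. A TLB hit avoids the page-table walk. */
uint32_t translate(uint32_t vaddr, bool *tlb_hit) {
    uint32_t vpn    = vaddr / PAGE_SIZE;
    uint32_t offset = vaddr % PAGE_SIZE;
    tlb_entry *e    = &tlb[vpn % TLB_ENTRIES];

    *tlb_hit = e->valid && e->vpn == vpn;
    if (!*tlb_hit) {
        /* TLB miss: walk the page table (slow) and cache the mapping. */
        e->vpn   = vpn;
        e->pfn   = page_table[vpn];
        e->valid = true;
    }
    return e->pfn * PAGE_SIZE + offset;      /* physical address            */
}

int main(void) {
    page_table[3] = 42;                      /* virtual page 3 -> frame 42  */
    bool hit;
    uint32_t pa = translate(3 * PAGE_SIZE + 0x10, &hit);
    printf("pa=0x%x (%s)\n", pa, hit ? "TLB hit" : "TLB miss");
    pa = translate(3 * PAGE_SIZE + 0x20, &hit);
    printf("pa=0x%x (%s)\n", pa, hit ? "TLB hit" : "TLB miss");
    return 0;
}
```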
GPU Cache
Graphics Processing Units (GPUs) initially had specialized caches (like texture caches) but have evolved to include more general-purpose caches for instructions and data, similar to CPUs, especially as they are used for general computation (GPGPU). These caches help manage the massive parallelism of GPUs efficiently.
DSP Cache
Digital Signal Processors (DSPs) also use caches, often including split instruction and data caches, similar to CPU designs, moving away from earlier architectures that relied more heavily on scratchpad memory.
Software Caches (Illustrating the Concept at Higher Levels)
Software caches are managed by software (like operating systems, libraries, or applications) and typically reside in main memory or even on disk. They operate at a higher level of abstraction than hardware caches.
Disk Cache (Operating System Page Cache)
The operating system kernel manages a large area of main memory (RAM) as a cache for data read from or written to slower storage devices like Hard Disk Drives (HDDs) or Solid State Drives (SSDs).
Page Cache (Operating System Disk Cache): A portion of main memory (RAM) managed by the operating system to cache data from disk drives or other block devices. It stores frequently accessed disk blocks to reduce the need for slow physical disk I/O.
When a program requests data from a file on disk, the OS first checks if that data block is in the page cache. If it is (a cache hit), the data is provided quickly from RAM. If it's a miss, the OS reads the block from disk, stores a copy in the page cache, and then gives the data to the program. Writes often go to the page cache first (write-back), and the OS periodically flushes dirty pages to disk.
- Motivation in this context: If you were building an operating system for your scratch-built computer, managing disk I/O would be a major challenge. Disk accesses are thousands to millions of times slower than CPU cycles. Implementing a page cache in RAM would be a crucial optimization to make file system access responsive.
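A much-simplified sketch of that read path: blocks are cached in RAM keyed by block number, and only a miss touches the (here faked) disk. The slot layout and the disk_read stand-in are assumptions for the example, not a real kernel interface.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE   512
#define CACHE_SLOTS  128

typedef struct {
    bool     valid;
    bool     dirty;
    uint64_t block_no;
    uint8_t  data[BLOCK_SIZE];
} cached_block;

static cached_block page_cache[CACHE_SLOTS];

/* Stand-in for the real disk driver: reading a block is the slow path. */
static void disk_read(uint64_t block_no, uint8_t *buf) {
    memset(buf, (int)(block_no & 0xFF), BLOCK_SIZE);   /* fake contents */
}

/* Read one block through the page cache. */
void cached_block_read(uint64_t block_no, uint8_t *out) {
    cached_block *slot = &page_cache[block_no % CACHE_SLOTS];
    if (!slot->valid || slot->block_no != block_no) {
        disk_read(block_no, slot->data);     /* miss: go to the disk      */
        slot->block_no = block_no;
        slot->valid    = true;
        slot->dirty    = false;
    }
    memcpy(out, slot->data, BLOCK_SIZE);     /* hit path: just copy RAM   */
}

int main(void) {
    uint8_t buf[BLOCK_SIZE];
    cached_block_read(7, buf);               /* first read: from "disk"   */
    cached_block_read(7, buf);               /* second read: from cache   */
    printf("first byte of block 7: %u\n", buf[0]);
    return 0;
}
```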
Web Cache
Web browsers and web proxy servers store copies of downloaded web resources (HTML pages, images, stylesheets, etc.).
Web Cache: A software cache used by web browsers or proxy servers to store copies of web pages, images, and other web content. It reduces network traffic, server load, and improves page loading speed for repeat visits.
When you visit a website, your browser might check its local web cache to see if it already has a recent copy of the page or its resources. If so, it loads them from your fast local storage instead of downloading them again over the internet (the slower backing store).
- Motivation in this context: This demonstrates how the caching principle applies beyond hardware. The "backing store" is a remote web server over a slow, high-latency network connection. The "cache" is local storage (RAM or disk) on the client machine.
Memoization
Memoization is an optimization technique specifically for functions in programming.
Memoization: An optimization technique where the results of expensive function calls are stored in a lookup table (a cache) and returned immediately when the same inputs occur again.
If a function is called multiple times with the same arguments, and it's computationally expensive to re-calculate the result each time, memoization stores the result after the first computation. Subsequent calls with the same arguments can retrieve the result from the cache instead of re-executing the function body.
- Motivation in this context: This shows caching applied at the application or algorithmic level. The "computation" is the slow operation (the backing store), and the "cache" is an in-memory data structure storing results.
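A classic illustration is a memoized Fibonacci function: the memo array below is the cache, and the recursive computation plays the role of the backing store. This is a hypothetical example written for this article, with the array bound chosen so results fit in 64 bits.

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_N 90                      /* results up to fib(90) fit in 64 bits */

static uint64_t memo[MAX_N + 1];      /* 0 means "not computed yet"           */

/* Naive recursion would recompute the same subproblems exponentially
 * many times; the memo table turns each repeat into a cache hit.       */
uint64_t fib(unsigned n) {
    if (n < 2)
        return n;
    if (memo[n] != 0)                 /* cache hit: reuse stored result  */
        return memo[n];
    memo[n] = fib(n - 1) + fib(n - 2);/* cache miss: compute and store   */
    return memo[n];
}

int main(void) {
    printf("fib(80) = %llu\n", (unsigned long long)fib(80));
    return 0;
}
```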
Content Delivery Network (CDN)
CDNs use a geographically distributed network of servers to cache copies of website content closer to users.
Content Delivery Network (CDN): A distributed network of servers that caches copies of website content (like images, videos, static files) at various locations around the world. Users are served content from the geographically nearest cache server, reducing latency and network load on the origin server.
When a user requests content from a website using a CDN, the request is directed to a server geographically close to the user that has a cached copy of the content. This significantly reduces the distance the data has to travel, improving loading speed.
- Motivation in this context: Another example of caching applied at a large, distributed scale to overcome network latency and origin server load.
Buffer vs. Cache: A Key Distinction
While often related, it's important to understand the fundamental difference between a buffer and a cache.
Buffer: A temporary storage area used to hold data while it is being transferred between two devices or processes that operate at different speeds or have different data handling requirements. Buffers smooth out data flow, match speeds, or assemble/disassemble data chunks.
Cache: A temporary storage area used to hold copies of data likely to be used again. Its primary goal is to reduce latency and improve performance for repeated accesses to the same data.
Here's a breakdown of their intent and use:
Primary Goal:
- Buffer: To facilitate data transfer, often between entities that cannot communicate directly or have different requirements (speed, chunk size, timing). It smooths flow and reduces the number of transfers by grouping small ones.
- Cache: To speed up subsequent accesses to data by storing copies in a faster location. It reduces the latency of repeated accesses.
Data Usage:
- Buffer: Data is typically written into a buffer once and read out of it once. It acts as a temporary holding area during transit.
- Cache: Data is intended to be read from the cache multiple times after being written/loaded once.
Data Presence:
- Buffer: The buffer holds the data instead of or before it goes to its final destination or process.
- Cache: The cache holds a copy of data that also exists in the backing store (or is the result of a computation that could be repeated).
Complexity:
- Buffer: Generally simpler. Doesn't necessarily need to track where the data came from in the backing store or manage complex coherence issues.
- Cache: More complex. Requires tracking tags to identify the original data, implementing replacement policies, write policies, and potentially cache coherence protocols.
Analogy:
- Buffer: Imagine a temporary staging area where goods are held as they are moved from a slow truck to a fast conveyor belt. The goods pass through the staging area once.
- Cache: Imagine a frequently used toolbox on a workbench. You pull tools from the main workshop (backing store) and keep the ones you use most often in the toolbox. When you need a tool, you check the toolbox first. If it's there (hit), great! If not (miss), you go to the workshop and maybe put it in the toolbox for next time.
In practice, caching systems often incorporate buffering techniques internally (e.g., writing a block to a write-back cache involves buffering the write before flushing). However, the core purpose and mechanism (tracking original source via tags, replacement policies based on usage) differentiate a cache from a simple buffer. Understanding this distinction is vital for designing systems where you need to manage data flow and optimize for performance.
Conclusion
Caching is a cornerstone of modern computer architecture and system design. From the critical, speed-of-light constraints of CPU caches to the distributed nature of web caches, the fundamental principle remains the same: exploit locality of reference by storing data copies closer to where they are needed, in faster storage.
As you delve into "The Lost Art of Building a Computer from Scratch," you will encounter caches at various stages. Designing or understanding even the simplest CPU cache, managing an operating system's disk cache, or implementing memoization in software requires a solid grasp of the concepts outlined here: the performance motivation, the hit/miss mechanism, replacement and write policies, and the challenges like cache coherence. Mastering these concepts is crucial for building efficient computing systems.